I. Introduction

Problems of controlling systems under uncertainty have long attracted the 
attention of many control theorists and engineers because of their importance in 
practical control systems. Since the work of Bellman [1], the stochastic adaptive
control approach has been useful for treating such problems ([2]; also see
[3] for a survey). For state space models, the optimization approach for stochastic
adaptive control has been studied extensively. However, explicit solutions
have been obtained for only a limited class of problems, for example, the
well-known certainty equivalence solution of the standard linear quadratic
Gaussian problem. Although more general problems have been conceptually solved
(i.e., requiring formal solutions of functional equations), explicit forms of
the optimal control laws (if they exist) have yet to be obtained. In order to
overcome the difficulties in solving the functional equations, many suboptimal
schemes have been proposed [3]. Most of them incorporate approximations for
some features of adaptive control. However, except for the ad hoc scheme where
the certainty equivalence principle is enforced (this scheme will be called the
CE law), they usually require a considerable amount of on-line computation,
which can often be prohibitive. For example, the control law based on the dual
control approach in [4], which exhibits an active learning property, requires
extensive on-line computation to evaluate future observation programs. The open-loop
optimal feedback control law (OLOF) ignores future measurements but incorporates
some information concerning the uncertainty (covariances of estimation
errors) in its control algorithm [5-7]. In this sense, this scheme was called
"cautious" in [3]. The OLOF law still requires on-line numerical optimization.

The purpose of this study is to investigate two suboptimal schemes which 
require little on-line computation but incorporate the effects of estimation 
errors in their control laws, and to study the performance of these laws by Monte 
Carlo simulations on a computer. We consider discrete-time linear stochastic 
systems with unknown control gain parameters (essentially the same class of prob- 
lems as that treated in [6]). Admittedly, this class of systems is small in 
practice. However, we believe that because of their conceptual simplicity and 
computational efficiency, the two laws derived in this report may provide a 
suitable framework for treating the more general problem, i.e., when the system 
state and control gain matrices are both unknown. 

One of the two control laws is based on underestimating future control,
hence called the UEFC law, and the other is based on overestimating future
control, the OEFC law. Two single-input, third-order systems (one stable and the
other unstable) are simulated, and the performance of the UEFC and OEFC laws is
compared with that of the CE law and the law where the control gain parameters
are known. The sensitivity of the performance of the four laws is studied for
various levels of initial uncertainties in the states and the control gain
parameters.

This report is organized as follows: Section II defines the notation. A
precise definition of the problem is given in Section III. Section IV presents
the results of the application of Kalman filter theory to the optimal estimation
problem. We derive the UEFC and OEFC laws in Section V, and Section VI shows
the results of the Monte Carlo simulations. Section VII concludes with remarks on
this study.
II. Notations


The transpose of a matrix X (vector x) is denoted by $X^T$ ($x^T$). The trace
of a square matrix X is denoted by tr(X). The matrices $I_n$ and $0_{m,n}$ denote
the n-dimensional identity matrix and the m×n null matrix, respectively; the
subscripts will be dropped when there is no ambiguity. The notation X > 0
(X ≥ 0) denotes a positive definite (semidefinite) matrix X, and X > Y (X ≥ Y)
implies X − Y > 0 (X − Y ≥ 0). The Kronecker product of matrices X and Y is
denoted by X ⊗ Y. The mn-dimensional row and column string vectors of an
m×n matrix X are denoted by rs(X) and cs(X); i.e.,

$$[\mathrm{rs}(X)]^T = [x_{r1}^T \;\; x_{r2}^T \;\; \cdots \;\; x_{rm}^T], \qquad [\mathrm{cs}(X)]^T = [x_{c1}^T \;\; x_{c2}^T \;\; \cdots \;\; x_{cn}^T]$$

where $x_{ri}$ ($x_{ci}$) is the i-th row (column) vector of X, written as a column vector.
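To make the row-major versus column-major ordering concrete, the following is a minimal Python/NumPy sketch (Python is an assumption; the report itself contains no code). The helper names `rs` and `cs` simply mirror the notation above, and 1-D NumPy arrays stand in for the column vectors.

```python
import numpy as np


def rs(X):
    """Row string vector rs(X): the rows of X stacked one after another."""
    return np.asarray(X).reshape(-1)             # row-major (C) order


def cs(X):
    """Column string vector cs(X): the columns of X stacked one after another."""
    return np.asarray(X).reshape(-1, order="F")  # column-major (Fortran) order


X = np.array([[1, 2, 3],
              [4, 5, 6]])                        # a 2 x 3 example matrix
print(rs(X))                                     # [1 2 3 4 5 6]
print(cs(X))                                     # [1 4 2 5 3 6]
assert np.array_equal(rs(X), cs(X.T))            # rs(X) = cs(X^T)
```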

The (conditional) expectation of a random vector x (given Y) is denoted by
E[x] (E[x|Y]). The notation $x \sim N(\bar x, X)$ means that a random vector x has a
Gaussian distribution with mean $\bar x$ and covariance X. Statements marked "a.s."
hold with probability 1.

Symbols with subscript or superscript "U" ("O") pertain to algorithms for
UEFC (OEFC).


III. Problem Statement 

We consider a standard finite-stage discrete-time linear stochastic con- 
trol problem with a quadratic performance index. The system dynamics and mea- 
surement relations are described by 

$$x(k+1) = Ax(k) + Bu(k) + D\xi(k) \qquad (1)$$

$$y(k+1) = Cx(k+1) + \eta(k+1), \qquad k = 0, 1, \ldots, N-1 \qquad (2)$$

where the state x(k), the control u(k), the measurement y(k) and the plant
noise ξ(k) are vectors of dimensions n, m, ℓ and r, respectively. The matrices
A, C and D are of appropriate dimensions and are assumed to be known. The n × m
control matrix B is a random matrix¹ with

$$b \sim N(\bar b, P_b), \qquad b \equiv \mathrm{rs}(B)$$

The other primary random variables are

$$x(0) \sim N(\bar x_0, P_0), \qquad \xi(k) \sim N(0, Q(k)), \qquad \eta(k) \sim N(0, R(k)), \quad R(k) > 0$$

ξ(k) and η(k) are mutually independent white noise sequences, and both are
independent of b and x(0); b and x(0) are also mutually independent.


¹For simplicity of derivation, we assume that B is a constant matrix.
The extension of our results to the case with linearly varying B as in [6] is
straightforward.
The performance measure we wish to minimize is given by

$$J \equiv E\left[\sum_{k=0}^{N-1} J(k)\right] \equiv E\left[\sum_{k=0}^{N-1} \left\{x(k+1)^T S(k+1)\, x(k+1) + u(k)^T \Lambda(k)\, u(k)\right\}\right]$$

where S(k+1) ≥ 0 and Λ(k) > 0. Admissible control laws are causal; i.e.,

$$u(k) = u(k, Y(k), U(k-1))$$

where Y(k) ≡ {y(1), …, y(k)} and U(k−1) ≡ {u(0), …, u(k−1)}. u(0)
must be a function of prior information on the system.


IV. Estimation 


Since, for a known control u(k), the system equations (1) and (2) are linear in the
random vector x(k) and the random matrix B, Kalman filter theory can be applied to a
modified (augmented) system to obtain the optimal minimum variance estimates.


Applying Lemma A.1 in the Appendix, we get

$$Bu(k) = I_n B u(k) = [I_n \otimes u(k)^T]\, b \qquad (4)$$
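Identity (4), and its column-string counterpart in footnote 2, can be checked numerically. The sketch below is illustrative only: the sizes n = 3, m = 2 and the random matrices are arbitrary choices for the check.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2                                  # illustrative dimensions only
B = rng.standard_normal((n, m))              # a sample control gain matrix
u = rng.standard_normal(m)                   # a sample control vector

b_rs = B.reshape(-1)                         # b = rs(B), as in Section III
b_cs = B.reshape(-1, order="F")              # cs(B), the footnote-2 arrangement

lhs = B @ u
rhs_rs = np.kron(np.eye(n), u.reshape(1, m)) @ b_rs   # [I_n (x) u^T] rs(B), eq. (4)
rhs_cs = np.kron(u.reshape(1, m), np.eye(n)) @ b_cs   # [u^T (x) I_n] cs(B)

assert np.allclose(lhs, rhs_rs)
assert np.allclose(lhs, rhs_cs)
print("Bu = [I_n (x) u^T] rs(B) verified")
```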


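We can write the following system equations for the augmented state vector
$z(k)^T \equiv [x(k)^T \;\; b^T]$:

$$z(k+1) = F(k)\, z(k) + G\,\xi(k) \qquad (5)$$

$$y(k+1) = H z(k+1) + \eta(k+1) \qquad (6)$$

where²

$$F(k) \equiv \begin{bmatrix} A & I_n \otimes u(k)^T \\ 0_{nm,n} & I_{nm} \end{bmatrix}, \qquad G \equiv \begin{bmatrix} D \\ 0_{nm,r} \end{bmatrix} \qquad (7)$$

$$H \equiv \begin{bmatrix} C & 0_{\ell,nm} \end{bmatrix} \qquad (8)$$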
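As a sketch of how (7)-(8) might be assembled in code, the hypothetical helper `augmented_matrices` below builds F(k), G and H with NumPy; the numerical values are illustrative and are not taken from the report. The check confirms that the noise-free part of (5) reproduces Ax + Bu and leaves b unchanged.

```python
import numpy as np


def augmented_matrices(A, C, D, u):
    """F(k), G and H of (7)-(8) for the augmented state z = [x; rs(B)]."""
    n, m = A.shape[0], u.size
    ell, r = C.shape[0], D.shape[1]
    nm = n * m
    F = np.block([[A, np.kron(np.eye(n), u.reshape(1, m))],   # [A, I_n (x) u^T]
                  [np.zeros((nm, n)), np.eye(nm)]])            # b is constant
    G = np.vstack([D, np.zeros((nm, r))])
    H = np.hstack([C, np.zeros((ell, nm))])
    return F, G, H


# Illustrative data (not taken from the report): n = 2, m = 1, ell = 1, r = 1.
A = np.array([[1.0, 0.2], [0.0, 0.9]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.1], [0.3]])
B = np.array([[0.5], [-1.0]])
x = np.array([1.0, 2.0])
u = np.array([0.7])

F, G, H = augmented_matrices(A, C, D, u)
z = np.concatenate([x, B.reshape(-1)])          # z = [x; rs(B)]
z_next = F @ z                                  # noise-free part of (5)
assert np.allclose(z_next[:2], A @ x + B @ u)   # state block: Ax + Bu
assert np.allclose(z_next[2:], B.reshape(-1))   # parameter block unchanged
print(F.shape, G.shape, H.shape)                # (4, 4) (4, 1) (1, 4)
```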


Application of Kalman filter theory to the linear equations (5) and (6) 
yields the following optimal minimum variance estimate: 


²If we arrange the vectors of B columnwise, we obtain augmented system
equations of the same form as (5)-(8), except that F(k) is given by

$$F(k) \equiv \begin{bmatrix} A & u(k)^T \otimes I_n \\ 0_{nm,n} & I_{nm} \end{bmatrix}$$

The augmented state vector for this case is $z(k)^T \equiv [x(k)^T \;\; b_c^T]$, where
$b_c \equiv \mathrm{cs}(B)$. The row string arrangement in (5)-(8) is preferred in order to
facilitate backward optimization (see Section V).
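$$\hat z(k+1|k+1) = F(k)\,\hat z(k|k) + K(k+1)\left[y(k+1) - HF(k)\,\hat z(k|k)\right] \qquad (9)$$

where

$$K(k+1) = P(k+1|k)\, H^T \left[H P(k+1|k) H^T + R(k+1)\right]^{-1} \qquad (10)$$

$$P(k+1|k) = F(k)\, P(k|k)\, F(k)^T + G\, Q(k)\, G^T \qquad (11)$$

$$P(k+1|k+1) = \left[I - K(k+1) H\right] P(k+1|k) \qquad (12)$$

with the initial conditions

$$\hat z(0|0) = \begin{bmatrix} \bar x_0 \\ \bar b \end{bmatrix}, \qquad P(0|0) = \begin{bmatrix} P_0 & 0_{n,nm} \\ 0_{nm,n} & P_b \end{bmatrix}$$

and

$$\hat z(k|k) \equiv E[z(k)|Y(k)], \qquad \hat z(k+1|k) \equiv E[z(k+1)|Y(k)] = F(k)\,\hat z(k|k)$$

$$P(k|k) \equiv E\left[\{z(k) - \hat z(k|k)\}\{z(k) - \hat z(k|k)\}^T \,\middle|\, Y(k)\right] \qquad (13)$$

$$P(k+1|k) \equiv E\left[\{z(k+1) - \hat z(k+1|k)\}\{z(k+1) - \hat z(k+1|k)\}^T \,\middle|\, Y(k)\right] \qquad (14)$$

We partition $\hat z(i|k)$ and $P(i|k)$ as

$$\hat z(i|k) = \begin{bmatrix} \hat x(i|k) \\ \hat b(i|k) \end{bmatrix}, \qquad P(i|k) = \begin{bmatrix} \pi_1(i|k) & \pi_3(i|k)^T \\ \pi_3(i|k) & \pi_2(i|k) \end{bmatrix} \qquad (15)$$

where $\hat x(i|k)$ is an n-dimensional vector, and $\pi_1(i|k)$ and $\pi_2(i|k)$ are n×n
and nm×nm matrices, respectively.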
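One recursion of (9)-(12) can be sketched as follows. The function `kalman_step` and all numbers in the example are hypothetical (a two-state, single-input case). The final assertion illustrates a standard equivalent form of the gain, K(k+1) = P(k+1|k+1) H^T R(k+1)^{-1}.

```python
import numpy as np


def kalman_step(z_hat, P, y_next, F, G, H, Q, R):
    """One recursion of (9)-(12) for the augmented system (5)-(6)."""
    z_pred = F @ z_hat                                  # z_hat(k+1|k) = F(k) z_hat(k|k)
    P_pred = F @ P @ F.T + G @ Q @ G.T                  # (11)
    S = H @ P_pred @ H.T + R                            # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)                 # (10)
    z_new = z_pred + K @ (y_next - H @ z_pred)          # (9)
    P_new = (np.eye(len(z_hat)) - K @ H) @ P_pred       # (12)
    return z_new, P_new, K


# Illustrative two-state, single-input example (values are hypothetical).
A = np.array([[1.0, 0.2], [0.0, 0.9]])
u = 0.5
F = np.block([[A, np.array([[u, 0.0], [0.0, u]])],    # [A, I_2 (x) u^T] with m = 1
              [np.zeros((2, 2)), np.eye(2)]])
G = np.array([[0.2], [0.4], [0.0], [0.0]])
H = np.array([[1.0, 0.0, 0.0, 0.0]])
Q, R = np.array([[0.01]]), np.array([[0.09]])

z0 = np.array([1.0, 1.0, 0.0, -0.4])                 # [x_bar_0; b_bar]
P0 = np.diag([4.0, 4.0, 4.0, 4.0])                   # blockdiag(P_0, P_b)
z1, P1, K1 = kalman_step(z0, P0, np.array([1.3]), F, G, H, Q, R)

# The gain also equals P(k+1|k+1) H^T R(k+1)^{-1}.
assert np.allclose(K1, P1 @ H.T @ np.linalg.inv(R))
assert np.allclose(P1, P1.T)                          # covariance stays symmetric
print(np.round(z1, 3))
```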


V. Feedback Control Laws 

It is well known that the control laws which solve the optimization problem
are the formal solutions of the functional equation [2]

$$J^*_k = \min_{u(k)} J_k, \qquad k = N-1, \ldots, 0 \qquad (16)$$

where

$$J_k \equiv E\left[J(k) + J^*_{k+1} \,\middle|\, Y(k)\right], \qquad J^*_N \equiv 0$$

However, closed-form solutions of the backward optimization are not available,
and various suboptimal schemes have been proposed (see, for example, [3] for a
survey of such schemes). Some of the schemes [4, 6] require a considerable
amount of on-line computation at each stage k. We derive here two feedback
laws which do not require lengthy on-line computations. The two laws are
obtained by carrying out the backward optimization (16) approximately. In the
following derivations of the control laws, the time indices will be dropped for
brevity when there is no ambiguity in notation.

V.1 Control Law Based on Underestimating Future Control Efforts (UEFC)

This control law is derived by underestimating the effects of future con- 
trol. The backward "sub-optimization" proceeds as follows: 


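Last Stage: k = N − 1

Since $J^*_N = 0$, it is easy to obtain the quadratic cost-to-go functional

$$J_{N-1} = E\left[x(N)^T S(N)\, x(N) + u(N-1)^T \Lambda(N-1)\, u(N-1) \,\middle|\, Y(N-1)\right]$$

$$= u(N-1)^T \Lambda(N-1)\, u(N-1) + \mathrm{tr}\left\{S(N)\, E\left[B u(N-1)\, u(N-1)^T B^T \,\middle|\, Y(N-1)\right]\right\}$$

$$\quad + 2\,\mathrm{tr}\left\{A^T S(N)\, E\left[B u(N-1)\, x(N-1)^T \,\middle|\, Y(N-1)\right]\right\} + \alpha(N-1) + \beta(N-1) \qquad (17)$$

where

$$\alpha(N-1) \equiv \mathrm{tr}\left\{A^T S(N) A\, E\left[x(N-1)\, x(N-1)^T \,\middle|\, Y(N-1)\right]\right\} \qquad (18)$$

$$\beta(N-1) \equiv \mathrm{tr}\left[D^T S(N) D\, Q(N-1)\right] \qquad (19)$$

are independent of u(N−1).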


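Recalling (4), and noting that u(N−1) is a function of Y(N−1), we can rewrite the second and third terms as

$$\mathrm{tr}\{S\, E[B u u^T B^T | Y]\} = \mathrm{tr}\{S (I_n \otimes u^T)\, E[b b^T | Y]\, (I_n \otimes u)\} = \mathrm{tr}\{S(N) [I_n \otimes u(N-1)^T]\, M_2(N-1|N-1)\, [I_n \otimes u(N-1)]\} \qquad (20)$$

$$\mathrm{tr}\{A^T S\, E[B u x^T | Y]\} = \mathrm{tr}\{A^T S (I_n \otimes u^T)\, E[b x^T | Y]\} = \mathrm{tr}\{A^T S(N) [I_n \otimes u(N-1)^T]\, M_3(N-1|N-1)\} \qquad (21)$$

where the M's are defined by

$$M(i|k) \equiv E\left[\begin{bmatrix} x(i)x(i)^T & x(i) b^T \\ b\, x(i)^T & b b^T \end{bmatrix} \middle|\, Y(k)\right] \equiv \begin{bmatrix} M_1(i|k) & M_3(i|k)^T \\ M_3(i|k) & M_2(i|k) \end{bmatrix} = \begin{bmatrix} \pi_1(i|k) + \hat x(i|k)\hat x(i|k)^T & \pi_3(i|k)^T + \hat x(i|k)\hat b(i|k)^T \\ \pi_3(i|k) + \hat b(i|k)\hat x(i|k)^T & \pi_2(i|k) + \hat b(i|k)\hat b(i|k)^T \end{bmatrix} \qquad (22)$$

Applying Lemma A.2 to (20) and (21), we have

$$\mathrm{tr}\{S (I_n \otimes u^T) M_2 (I_n \otimes u)\} = \mathrm{cs}(I_n \otimes u)^T (S \otimes M_2)\, \mathrm{cs}(I_n \otimes u) = u(N-1)^T \left[\Gamma^T \{S(N) \otimes M_2(N-1|N-1)\}\, \Gamma\right] u(N-1) \qquad (23)$$

$$\mathrm{tr}\{A^T S (I_n \otimes u^T) M_3\} = \mathrm{tr}\{M_3 A^T S (I_n \otimes u^T)\} = [\mathrm{cs}(M_3 A^T S)]^T\, \mathrm{cs}(I_n \otimes u) = \left\{\Gamma^T \mathrm{cs}[M_3(N-1|N-1)\, A^T S(N)]\right\}^T u(N-1) \qquad (24)$$

where the following identity was used to obtain the final expressions:

$$\mathrm{cs}(I_n \otimes u) = \Gamma u, \qquad \Gamma^T \equiv \begin{bmatrix} I_m & 0_{m,nm} & I_m & 0_{m,nm} & \cdots & 0_{m,nm} & I_m \end{bmatrix} \qquad (25)$$

Note that Γ is an $n^2 m \times m$ matrix.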
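The second moments (22), the matrix Γ of (25), and the trace identities (23)-(24) can be checked numerically. In this sketch the helper names (`cs`, `gamma`, `second_moments`) and the random test data are assumptions made for illustration; they are not part of the report.

```python
import numpy as np


def cs(X):
    return X.reshape(-1, order="F")                # column string vector


def gamma(n, m):
    """Gamma of (25), an (n*n*m) x m matrix with cs(I_n (x) u) = Gamma u."""
    e = np.eye(n)
    return np.vstack([np.kron(e[:, [j]], np.eye(m)) for j in range(n)])


def second_moments(z_hat, P, n):
    """M_1, M_2, M_3 of (22) from the filter output z_hat(i|k), P(i|k)."""
    M = P + np.outer(z_hat, z_hat)                 # E[z z^T | Y(k)]
    return M[:n, :n], M[n:, n:], M[n:, :n]         # M_1, M_2, M_3


rng = np.random.default_rng(0)
n, m = 3, 2                                         # illustrative sizes
L = rng.standard_normal((n + n * m, n + n * m))
P = L @ L.T                                         # a random covariance
z_hat = rng.standard_normal(n + n * m)
M1, M2, M3 = second_moments(z_hat, P, n)

Gam = gamma(n, m)
u = rng.standard_normal(m)
A = rng.standard_normal((n, n))
W = rng.standard_normal((n, n))
S = W @ W.T                                         # S >= 0
I_u_T = np.kron(np.eye(n), u.reshape(1, m))         # I_n (x) u^T
I_u = np.kron(np.eye(n), u.reshape(m, 1))           # I_n (x) u

assert np.allclose(cs(I_u), Gam @ u)                # identity (25)
lhs23 = np.trace(S @ I_u_T @ M2 @ I_u)
rhs23 = u @ Gam.T @ np.kron(S, M2) @ Gam @ u        # (23)
lhs24 = np.trace(A.T @ S @ I_u_T @ M3)
rhs24 = (Gam.T @ cs(M3 @ A.T @ S)) @ u              # (24)
assert np.isclose(lhs23, rhs23) and np.isclose(lhs24, rhs24)
print(Gam.shape)                                     # (18, 2)
```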

Thus, (17), (23) and (24) yield

$$J_{N-1} = u(N-1)^T [\Lambda(N-1) + \Theta(N-1)]\, u(N-1) + 2 w(N-1)^T u(N-1) + \alpha(N-1) + \beta(N-1) \qquad (26)$$

where

$$\Theta(N-1) \equiv \Gamma^T [S(N) \otimes M_2(N-1|N-1)]\, \Gamma \qquad (27)$$

$$w(N-1) \equiv \Gamma^T \mathrm{cs}[M_3(N-1|N-1)\, A^T S(N)] \qquad (28)$$

Therefore, the optimal control law u*(N−1) and the associated cost-to-go are
given by

$$u^*(N-1) = -[\Lambda(N-1) + \Theta(N-1)]^{-1} w(N-1) \qquad (29)$$

$$J^*_{N-1} = -w(N-1)^T [\Lambda(N-1) + \Theta(N-1)]^{-1} w(N-1) + \alpha(N-1) + \beta(N-1) \qquad (30)$$

Note that Θ(N−1) ≥ 0 a.s., since S(N) ≥ 0 and $M_2(N-1|N-1) > 0$ a.s. (see
Lemma A.3 in the Appendix). Hence Λ(N−1) + Θ(N−1) > 0 and is invertible a.s.,
since Λ(N−1) > 0.
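The closed form (26)-(30) can be sanity-checked against a Monte Carlo estimate of the defining expectation in (17), assuming (as the Kalman filter of Section IV provides) that [x(N−1); b] given Y(N−1) is Gaussian with mean $\hat z(N-1|N-1)$ and covariance P(N−1|N−1). All numerical values below are hypothetical, and the 2% tolerance only absorbs sampling error.

```python
import numpy as np


def cs(X):
    return X.reshape(-1, order="F")


def gamma(n, m):
    return np.vstack([np.kron(np.eye(n)[:, [j]], np.eye(m)) for j in range(n)])


rng = np.random.default_rng(1)
n, m = 2, 1                                        # hypothetical small example
A = np.array([[1.0, 0.2], [-0.5, 0.9]])
D = np.array([[0.2], [0.4]])
S_N, Lam, Q = np.eye(n), np.array([[1.0]]), np.array([[0.01]])
z_hat = np.array([1.0, -0.5, 0.3, -0.4])          # [x_hat; b_hat](N-1|N-1), hypothetical
L = 0.3 * rng.standard_normal((4, 4))
P = L @ L.T + 0.05 * np.eye(4)                     # P(N-1|N-1), hypothetical

M = P + np.outer(z_hat, z_hat)                     # (22)
M1, M2, M3 = M[:n, :n], M[n:, n:], M[n:, :n]
Gam = gamma(n, m)
Theta = Gam.T @ np.kron(S_N, M2) @ Gam             # (27)
w = Gam.T @ cs(M3 @ A.T @ S_N)                     # (28)
alpha = np.trace(A.T @ S_N @ A @ M1)               # (18)
beta = np.trace(D.T @ S_N @ D @ Q)                 # (19)

u_star = -np.linalg.solve(Lam + Theta, w)          # (29)
J_star = -w @ np.linalg.solve(Lam + Theta, w) + alpha + beta   # (30)


def J_quadratic(u):                                # (26)
    return u @ (Lam + Theta) @ u + 2 * w @ u + alpha + beta


# Monte Carlo estimate of the expectation in (17) at u = u*(N-1).
zs = rng.multivariate_normal(z_hat, P, size=200_000)
x, b = zs[:, :n], zs[:, n:]
xi = rng.normal(0.0, np.sqrt(Q[0, 0]), size=(len(zs), 1))
x_N = x @ A.T + b.reshape(-1, n, m) @ u_star + xi @ D.T   # B recovered row-wise from b
cost = np.einsum("ij,jk,ik->i", x_N, S_N, x_N) + u_star @ Lam @ u_star
J_mc = cost.mean()

assert np.isclose(J_quadratic(u_star), J_star)
assert abs(J_mc - J_star) / J_star < 0.02          # sampling error only
assert J_quadratic(u_star + 0.1) > J_star and J_quadratic(u_star - 0.1) > J_star
```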


Stage k = N − 2

The functional relation (16) yields

$$J_{N-2} = E[J(N-2) + J^*_{N-1} \,|\, Y(N-2)]$$

$$= E\left[-w(N-1)^T \{\Lambda(N-1) + \Theta(N-1)\}^{-1} w(N-1) \,\middle|\, Y(N-2)\right] + E[J(N-2) + \alpha(N-1) \,|\, Y(N-2)] + \beta(N-1) \qquad (31)$$

Since Y(N−1) = {Y(N−2), y(N−1)}, from (18)

$$E[\alpha(N-1) \,|\, Y(N-2)] = E\left[E\{x(N-1)^T A^T S(N) A\, x(N-1) \,|\, Y(N-1)\} \,\middle|\, Y(N-2)\right] = E[x(N-1)^T A^T S(N) A\, x(N-1) \,|\, Y(N-2)]$$

Therefore, it is straightforward to obtain

$$J^U_{N-2} \equiv E[J(N-2) + \alpha(N-1) \,|\, Y(N-2)] + \beta(N-1)$$

$$= u(N-2)^T [\Lambda(N-2) + \Theta_U(N-2)]\, u(N-2) + 2 w_U(N-2)^T u(N-2) + \alpha_U(N-2) + \beta_U(N-2) \qquad (32)$$

where

$$\Theta_U(N-2) \equiv \Gamma^T [V_U(N-2) \otimes M_2(N-2|N-2)]\, \Gamma \qquad (33)$$

$$w_U(N-2) \equiv \Gamma^T \mathrm{cs}[M_3(N-2|N-2)\, A^T V_U(N-2)] \qquad (34)$$

$$\alpha_U(N-2) \equiv \mathrm{tr}\{A^T V_U(N-2) A\, E[x(N-2)\, x(N-2)^T \,|\, Y(N-2)]\} \qquad (35)$$

$$\beta_U(N-2) \equiv \beta(N-1) + \mathrm{tr}[D^T V_U(N-2) D\, Q(N-2)] \qquad (36)$$

$$V_U(N-2) \equiv S(N-1) + A^T S(N) A \qquad (37)$$

The difficulty in optimization lies in evaluating the first term in (31),
since Θ(N−1) and w(N−1) are a complicated random matrix and vector,
respectively, depending on u(N−2). In this control law the term is neglected in
order to simplify the backward optimization. Note that the term is nonpositive a.s.,
since Λ(N−1) + Θ(N−1) > 0 a.s. This term originates from the first two
terms in (26) (with the optimal law u*(N−1) in (29)), and accounts for the
amount of cost reduction due to the control at stage N−1. Hence the omission
of this term means that the control law at N−2 is designed by neglecting the
control effect at N−1 (E[α(N−1)|Y(N−2)] accounts for the cost due to the
free motion from N−1 to N). Although this approximation may seem somewhat
ad hoc, the resulting control law requires little on-line computation and shows
good performance in the simulated examples, as will be observed in Section VI.

With the above simplification, we have the control law $u_U(N-2)$, which minimizes
$J^U_{N-2}$, and the associated cost-to-go functional

$$u_U(N-2) = -[\Lambda(N-2) + \Theta_U(N-2)]^{-1} w_U(N-2) \qquad (38)$$

$$J^*_{N-2} \approx J^{U*}_{N-2} = -w_U(N-2)^T [\Lambda(N-2) + \Theta_U(N-2)]^{-1} w_U(N-2) + \alpha_U(N-2) + \beta_U(N-2) \qquad (39)$$


Algorithm for UEFC

By proceeding with the simplification described for stage N−2, we obtain
the control law for a general stage k:

$$u_U(k) = -[\Lambda(k) + \Theta_U(k)]^{-1} w_U(k) \qquad (40)$$

where

$$\Theta_U(k) \equiv \Gamma^T [V_U(k) \otimes M_2(k|k)]\, \Gamma \geq 0 \quad \text{a.s.} \qquad (41)$$

$$w_U(k) \equiv \Gamma^T \mathrm{cs}[M_3(k|k)\, A^T V_U(k)] \qquad (42)$$

$$V_U(k) = S(k+1) + A^T V_U(k+1) A, \quad k = N-1, \ldots, 0; \qquad V_U(N) = 0_{n,n} \qquad (43)$$

and Γ and M(k|k) are defined by (25) and (22), respectively.

Remarks:

1. Since $V_U(k)$ can be computed off-line by (43), this control law requires
no on-line recursive computation, but only the computation of $\Theta_U(k)$ and
$w_U(k)$ to obtain $u_U(k)$.
2. Note that $\Theta_U(k)$ and $w_U(k)$ are functions of $\pi_2(k|k)$ and $\pi_3(k|k)$, measures
of estimation error, as well as $\hat x(k|k)$ and $\hat b(k|k)$ (see equation (22)). In
this sense UEFC is cautious like OLOF [3].

3. As noted above for stage N−1, $\Lambda(k) + \Theta_U(k) > 0$ and is invertible a.s.;
hence (40) provides a well-defined control law a.s.
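Remarks 1 and 2 can be illustrated with a short sketch: $V_U(k)$ comes from the off-line recursion (43), and only $\Theta_U(k)$ and $w_U(k)$ are formed on line. The matrix A is the stable matrix of Section VI.1 and $\hat b$ uses the prior mean given there; the covariances are hypothetical, and the helper names `uefc_V` and `uefc_control` are illustrative. Increasing the uncertainty $\pi_2$ in b shrinks $|u_U(0)|$, which is the cautious behavior described in Remark 2.

```python
import numpy as np


def cs(X):
    return X.reshape(-1, order="F")


def gamma(n, m):
    return np.vstack([np.kron(np.eye(n)[:, [j]], np.eye(m)) for j in range(n)])


def uefc_V(A, S, N):
    """Off-line recursion (43). S[k] holds S(k) for k = 1..N (S[0] is unused)."""
    V = [None] * (N + 1)
    V[N] = np.zeros_like(A)                                  # V_U(N) = 0
    for k in range(N - 1, -1, -1):
        V[k] = S[k + 1] + A.T @ V[k + 1] @ A
    return V


def uefc_control(A, Lam_k, V_k, z_hat, P, m):
    """u_U(k) of (40)-(42), with M_2(k|k), M_3(k|k) formed as in (22)."""
    n = A.shape[0]
    M = P + np.outer(z_hat, z_hat)
    M2, M3 = M[n:, n:], M[n:, :n]
    Gam = gamma(n, m)
    Theta = Gam.T @ np.kron(V_k, M2) @ Gam                   # (41)
    w = Gam.T @ cs(M3 @ A.T @ V_k)                           # (42)
    return -np.linalg.solve(Lam_k + Theta, w)                # (40)


A = np.array([[1.0, 0.2, 0.0], [0.0, 1.0, 0.2], [-1.0, -1.4, 0.4]])
N = 19
S = [np.eye(3)] * (N + 1)                                    # S(k) = I_3
V = uefc_V(A, S, N)
z_hat = np.array([1.0, 1.0, 1.0, 0.0, 0.0, -0.4])           # [x_hat; b_hat] at k = 0
P_small = np.diag([4.0] * 3 + [0.01] * 3)                    # little uncertainty in b
P_large = np.diag([4.0] * 3 + [4.0] * 3)                     # large uncertainty in b
u_small = uefc_control(A, np.eye(1), V[0], z_hat, P_small, 1)
u_large = uefc_control(A, np.eye(1), V[0], z_hat, P_large, 1)
print(u_small, u_large)      # larger P_b gives a smaller |u_U(0)| (caution)
```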

V.2 Control Law Based on Overestimating Future Control Efforts (OEFC)

Stage k = N − 2

The UEFC law was obtained by neglecting the term due to the control efforts
at stage N−1 because of the difficulty in approximating the term in a simple
manner. Here we bound the term, the first (nonpositive) term in (31), from below,
thereby obtaining a control law (OEFC) by overestimating the control efforts at
stage N−1.

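Lemma

The first term in (31) can be bounded as

$$-w(N-1)^T [\Lambda(N-1) + \Theta(N-1)]^{-1} w(N-1) \geq -\mathrm{tr}\{[\Lambda(N-1) + \Theta(N-1)]^{-1} \Theta(N-1)\}\, \alpha(N-1) \qquad (44)$$

Proof:

Using S(N) ≥ 0 and M(N−1|N−1) > 0 in Lemma A.4 in the Appendix, we have

$$\begin{bmatrix} S(N) \otimes M_1(N-1|N-1) & S(N) \otimes M_3(N-1|N-1)^T \\ S(N) \otimes M_3(N-1|N-1) & S(N) \otimes M_2(N-1|N-1) \end{bmatrix} \geq 0 \qquad (45)$$

We define

$$\begin{bmatrix} \mathrm{tr}\{A^T S(N) A\, M_1(N-1|N-1)\} & \{\mathrm{cs}[M_3(N-1|N-1)\, A^T S(N)]\}^T \\ \mathrm{cs}[M_3(N-1|N-1)\, A^T S(N)] & S(N) \otimes M_2(N-1|N-1) \end{bmatrix} \qquad (46)$$

then

$$\alpha(N-1) = \mathrm{tr}[A^T S(N) A\, M_1(N-1|N-1)] = \mathrm{tr}(M_1 A^T S A) = \mathrm{tr}(S A M_1 A^T) = [\mathrm{cs}(A^T)]^T (S \otimes M_1)\, \mathrm{cs}(A^T)$$

where Lemma A.2 was used to obtain the last equality. Also from Lemma A.1,

$$\mathrm{cs}[M_3(N-1|N-1)\, A^T S(N)] = (S \otimes M_3)\, \mathrm{cs}(A^T)$$

Therefore, the matrix in (46) is the congruence transformation

$$\begin{bmatrix} \mathrm{cs}(A^T)^T & 0 \\ 0 & I \end{bmatrix} \begin{bmatrix} S \otimes M_1 & S \otimes M_3^T \\ S \otimes M_3 & S \otimes M_2 \end{bmatrix} \begin{bmatrix} \mathrm{cs}(A^T) & 0 \\ 0 & I \end{bmatrix} \geq 0$$

of the matrix in (45), and an application of Lemma A.4 to (46) yields

$$\mathrm{cs}[M_3(N-1|N-1)\, A^T S(N)]\, \{\mathrm{cs}[M_3(N-1|N-1)\, A^T S(N)]\}^T \leq \alpha(N-1)\, [S(N) \otimes M_2(N-1|N-1)] \qquad (47)$$

Thus, from (27) and (28),

$$w(N-1)^T [\Lambda(N-1) + \Theta(N-1)]^{-1} w(N-1) = \mathrm{tr}\{(\Lambda + \Theta)^{-1}\, \Gamma^T \mathrm{cs}(M_3 A^T S)\, [\mathrm{cs}(M_3 A^T S)]^T\, \Gamma\}$$

$$\leq \mathrm{tr}\{(\Lambda + \Theta)^{-1}\, \Gamma^T \alpha\, (S \otimes M_2)\, \Gamma\} = \mathrm{tr}\{[\Lambda(N-1) + \Theta(N-1)]^{-1} \Theta(N-1)\}\, \alpha(N-1)$$

where (47), Λ + Θ > 0 a.s. and Lemma A.5 were used to obtain the inequality.
This completes the proof.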
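The bound (44) can also be spot-checked numerically. The sketch below draws random S(N) ≥ 0, M(N−1|N−1) > 0, A and Λ > 0 (all hypothetical) with m = 2, so that the trace in (44) is nontrivial, and confirms that the inequality holds in every trial. A random test of this kind illustrates the Lemma but does not replace the proof.

```python
import numpy as np


def cs(X):
    return X.reshape(-1, order="F")


def gamma(n, m):
    return np.vstack([np.kron(np.eye(n)[:, [j]], np.eye(m)) for j in range(n)])


rng = np.random.default_rng(2)
n, m = 2, 2                                        # multi-input, to exercise the trace
Gam = gamma(n, m)
for trial in range(200):
    A = rng.standard_normal((n, n))
    W = rng.standard_normal((n, n))
    S = W @ W.T                                                    # S(N) >= 0
    L = rng.standard_normal((n + n * m, n + n * m))
    z = rng.standard_normal(n + n * m)
    M = L @ L.T + np.outer(z, z)                                   # M(N-1|N-1) > 0 a.s.
    M1, M2, M3 = M[:n, :n], M[n:, n:], M[n:, :n]
    Lam = np.eye(m) * rng.uniform(0.1, 2.0)                        # Lambda(N-1) > 0
    Theta = Gam.T @ np.kron(S, M2) @ Gam                           # (27)
    w = Gam.T @ cs(M3 @ A.T @ S)                                   # (28)
    alpha = np.trace(A.T @ S @ A @ M1)                             # (18)
    lhs = -w @ np.linalg.solve(Lam + Theta, w)
    rhs = -np.trace(np.linalg.solve(Lam + Theta, Theta)) * alpha   # right side of (44)
    assert lhs >= rhs - 1e-9 * (1 + abs(rhs))
print("bound (44) held in all trials")
```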

Using the above Lemma and (31), we have a lower bound for $J_{N-2}$:

$$J_{N-2} \geq J^U_{N-2} - E\left[\mathrm{tr}\{[\Lambda(N-1) + \Theta(N-1)]^{-1} \Theta(N-1)\}\, \alpha(N-1) \,\middle|\, Y(N-2)\right] \qquad (48)$$

where Θ(N−1) and α(N−1) are a random matrix and a random variable, respectively, given
Y(N−2), and no simple expression is available for the second term. As can be
observed in (27), Θ(N−1) is a function of $M_2(N-1|N-1)$, the estimate of $bb^T$
(a constant random matrix) at N−1. In order to proceed with the analysis in a
simple manner, Θ(N−1) is replaced by its estimate

$$\hat\Theta(N-1|N-2) \equiv E[\Theta(N-1)|Y(N-2)] = \Gamma^T [S(N) \otimes M_2(N-2|N-2)]\, \Gamma \qquad (49)$$

(the equality holds because Θ(N−1) is linear in $M_2(N-1|N-1)$ and
$E[M_2(N-1|N-1)|Y(N-2)] = E[bb^T|Y(N-2)] = M_2(N-2|N-2)$),
which is a function of Y(N−2). Therefore, (48) is approximated by

$$J^O_{N-2} = J^U_{N-2} - \mathrm{tr}\{[\Lambda(N-1) + \hat\Theta(N-1|N-2)]^{-1} \hat\Theta(N-1|N-2)\}\, E[\alpha(N-1)|Y(N-2)] \qquad (50)$$

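From (32)-(37) and (50), we have the following cost-to-go expression for OEFC:

$$J^O_{N-2} = u(N-2)^T [\Lambda(N-2) + \Theta_O(N-2)]\, u(N-2) + 2 w_O(N-2)^T u(N-2) + \alpha_O(N-2) + \beta_O(N-2) \qquad (51)$$

where

$$\Theta_O(N-2) \equiv \Gamma^T [V_O(N-2) \otimes M_2(N-2|N-2)]\, \Gamma \qquad (52)$$

$$w_O(N-2) \equiv \Gamma^T \mathrm{cs}[M_3(N-2|N-2)\, A^T V_O(N-2)] \qquad (53)$$

$$\alpha_O(N-2) \equiv \mathrm{tr}[A^T V_O(N-2) A\, M_1(N-2|N-2)] \qquad (54)$$

$$\beta_O(N-2) \equiv \beta(N-1) + \mathrm{tr}[D^T V_O(N-2) D\, Q(N-2)]$$

$$V_O(N-2) \equiv S(N-1) + \epsilon(N-1|N-2)\, A^T S(N) A \qquad (55)$$

$$\epsilon(N-1|N-2) \equiv 1 - \mathrm{tr}\{[\Lambda(N-1) + \hat\Theta(N-1|N-2)]^{-1} \hat\Theta(N-1|N-2)\} \qquad (56)$$

Therefore, the OEFC control law, which minimizes $J^O_{N-2}$, is given by

$$u_O(N-2) = -[\Lambda(N-2) + \Theta_O(N-2)]^{-1} w_O(N-2) \qquad (57)$$

and the associated cost-to-go is

$$J^{O*}_{N-2} = -w_O(N-2)^T [\Lambda(N-2) + \Theta_O(N-2)]^{-1} w_O(N-2) + \alpha_O(N-2) + \beta_O(N-2) \qquad (58)$$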

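Algorithm for OEFC

Since the expression (58) for $J^{O*}_{N-2}$ has the same quadratic form as (39)
for $J^{U*}_{N-2}$, it is easy to obtain the OEFC control law for a general stage k:

$$u_O(k) = -[\Lambda(k) + \Theta_O(k)]^{-1} w_O(k) \qquad (59)$$

where

$$\Theta_O(k) \equiv \Gamma^T [V_O(k) \otimes M_2(k|k)]\, \Gamma \qquad (60)$$

$$w_O(k) \equiv \Gamma^T \mathrm{cs}[M_3(k|k)\, A^T V_O(k)] \qquad (61)$$

The matrix $V_O(k)$ is computed by the following (backward) recursive formula:

$$V(i|k) = S(i+1) + \epsilon(i+1|k)\, A^T V(i+1|k)\, A, \qquad i = N-1, N-2, \ldots, k \qquad (62)$$

$$V_O(k) \equiv V(k|k), \qquad V(N|k) \equiv 0_{n,n} \qquad (63)$$

$$\epsilon(i+1|k) \equiv 1 - \mathrm{tr}\{[\Lambda(i+1) + \hat\Theta(i+1|k)]^{-1} \hat\Theta(i+1|k)\} \qquad (64)$$

$$\hat\Theta(i+1|k) \equiv \Gamma^T [V(i+1|k) \otimes M_2(k|k)]\, \Gamma \qquad (65)$$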

Remarks:

1. The OEFC algorithm has the same structure as the UEFC law given by (40)-(43),
which is recovered by setting ε(i+1|k) ≡ 1 (compare (43) with (62)).

2. The OEFC law requires more on-line computation than the UEFC law, since
$\hat\Theta(i+1|k)$ depends on $M_2(k|k) = E[bb^T|Y(k)]$ and (62) must be recursively
computed for each stage k.
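A sketch of the backward recursion (62)-(65) for a single stage k, written with NumPy. The function `oefc_V`, its argument layout, and the data (A from Section VI.1 and a hypothetical $M_2(0|0)$) are assumptions for illustration. For i = N − 1 the factor ε(N|k) multiplies V(N|k) = 0, so Λ(N) is never needed. The example also computes $V_U(0)$ from (43); for a single input, 0 < ε ≤ 1, so $V_O(0) \leq V_U(0)$.

```python
import numpy as np


def gamma(n, m):
    return np.vstack([np.kron(np.eye(n)[:, [j]], np.eye(m)) for j in range(n)])


def oefc_V(A, S, Lam, M2_k, k, N, m):
    """V_O(k) = V(k|k) from (62)-(65).

    S[i] holds S(i) for i = 1..N and Lam[i] holds Lambda(i) for i = 0..N-1.
    """
    n = A.shape[0]
    Gam = gamma(n, m)
    V_next = np.zeros((n, n))                          # V(N|k) = 0, (63)
    for i in range(N - 1, k - 1, -1):                  # i = N-1, ..., k
        if i == N - 1:
            eps = 1.0    # multiplies V(N|k) = 0, so its value does not matter
        else:
            Theta_hat = Gam.T @ np.kron(V_next, M2_k) @ Gam            # (65)
            eps = 1.0 - np.trace(np.linalg.solve(Lam[i + 1] + Theta_hat,
                                                 Theta_hat))           # (64)
        V_next = S[i + 1] + eps * A.T @ V_next @ A     # (62): V(i|k)
    return V_next                                      # V_O(k) = V(k|k)


A = np.array([[1.0, 0.2, 0.0], [0.0, 1.0, 0.2], [-1.0, -1.4, 0.4]])
N, m = 19, 1
S = [np.eye(3)] * (N + 1)
Lam = [np.eye(1)] * N
b_hat = np.array([0.0, 0.0, -0.4])
M2_0 = 4.0 * np.eye(3) + np.outer(b_hat, b_hat)        # pi_2 + b_hat b_hat^T

V_O = oefc_V(A, S, Lam, M2_0, 0, N, m)
V_U = np.eye(3)                                        # V_U(N-1) = S(N)
for _ in range(N - 1):                                 # (43) down to k = 0
    V_U = np.eye(3) + A.T @ V_U @ A
print(np.trace(V_O), np.trace(V_U))
```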
VI. Examples

A computer simulation study was performed to evaluate the performance of
the UEFC and OEFC control laws. The two systems selected are single-input
third-order systems, and are essentially the same as those in [6]; one is a stable
system and the other is an unstable system. The performance of the laws for
Monte Carlo runs is statistically compared with that of the certainty equivalence
law (CE) and of the optimal control law when B is known (called the LQG algorithm,
i.e., the solution of the standard LQG problem). The sensitivity of performance of
the four algorithms is studied for various levels of initial uncertainties
($P_b$ and $P_0$).

The system matrices common to the two systems are

$C = [1 \;\; 0 \;\; 0]$, $D^T = [0.2 \;\; 0.4 \;\; 0.6]$

$\bar x_0^T = [1 \;\; 1 \;\; 1]$, Q(k) = 0.01, R(k) = 0.09

$S(k+1) = I_3$, Λ(k) = 1

We simulate 20-stage (N = 19) processes, and compute the sample mean $M_J$ and
standard deviation $S_J$ of the performance measure $\sum_{k=0}^{N-1} J(k)$ for 20 Monte Carlo
runs.

VI.1. Stable System

The system matrices are given by

$$A = \begin{bmatrix} 1 & 0.2 & 0.0 \\ 0 & 1.0 & 0.2 \\ -1 & -1.4 & 0.4 \end{bmatrix}, \qquad B = b, \quad \bar b = \begin{bmatrix} 0.0 \\ 0.0 \\ -0.4 \end{bmatrix}$$

where A has eigenvalues 0.8 and 0.8 ± 0.4j. The performance of the four algorithms
(UEFC, OEFC, CE and LQG) for the Monte Carlo runs is plotted in Figure 1
(sample mean $M_J$) and Figure 2 (sample standard deviation $S_J$) for $P_0 = 4I_3$
and different $P_b$'s. The abscissa in the figures is $\sigma_b$, where $P_b = \sigma_b^2 I_3$.
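The eigenvalues quoted above can be confirmed from the characteristic polynomial. The sketch below does this for the stable matrix and, for comparison, for the unstable matrix given in Section VI.2; both matrices are transcribed from those sections. The moduli of 0.8 ± 0.4j are √0.8 ≈ 0.894, so the stable A has spectral radius below one, while the unstable A has the real eigenvalue 1.2.

```python
import numpy as np

A_stable = np.array([[1.0, 0.2, 0.0],
                     [0.0, 1.0, 0.2],
                     [-1.0, -1.4, 0.4]])      # Section VI.1
A_unstable = np.array([[1.0, 0.2, 0.0],
                       [0.0, 1.0, 0.2],
                       [1.0, -0.6, 0.8]])     # Section VI.2

# (s - 0.8)(s^2 - 1.6 s + 0.8) = s^3 - 2.4 s^2 + 2.08 s - 0.64
assert np.allclose(np.poly(A_stable), [1.0, -2.4, 2.08, -0.64])
# (s - 1.2)(s^2 - 1.6 s + 0.8) = s^3 - 2.8 s^2 + 2.72 s - 0.96
assert np.allclose(np.poly(A_unstable), [1.0, -2.8, 2.72, -0.96])

rho_stable = np.max(np.abs(np.linalg.eigvals(A_stable)))
rho_unstable = np.max(np.abs(np.linalg.eigvals(A_unstable)))
print(rho_stable, rho_unstable)   # about 0.894 and 1.2
assert rho_stable < 1.0 < rho_unstable
```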

For each of the 20 runs, B = b and x(0) are randomly generated from the
distributions $b \sim N(\bar b, P_b)$ and $x(0) \sim N(\bar x_0, P_0)$. Similarly, Figures 3 and 4 show the
dependence of $M_J$ and $S_J$ on various $\sigma_0$'s, where $P_0 = \sigma_0^2 I_3$, for $P_b = 4I_3$.

In order to see the normalized performance of the suboptimal laws, the ratio

$$r_J \equiv \frac{M_J \text{ for a suboptimal law}}{M_J \text{ for the LQG law}}$$

is plotted in Figures 5 and 6 for various $\sigma_b$'s and $\sigma_0$'s, respectively.
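The statistics $M_J$, $S_J$ and $r_J$ can be computed as below. The per-run totals are hypothetical numbers, not data from the report, and the use of the (n − 1) divisor for $S_J$ is an assumption, since the report does not state which divisor was used.

```python
from statistics import mean, stdev


def summarize(totals):
    """Sample mean M_J and sample standard deviation S_J of per-run totals."""
    return mean(totals), stdev(totals)        # stdev uses the (n - 1) divisor


def ratio_rJ(totals_suboptimal, totals_lqg):
    """r_J = M_J(suboptimal law) / M_J(LQG law)."""
    return mean(totals_suboptimal) / mean(totals_lqg)


# Hypothetical per-run totals sum_k J(k) (NOT data from this report).
lqg = [400.0, 420.0, 380.0]
uefc = [800.0, 760.0, 840.0]
M_J, S_J = summarize(uefc)
print(M_J, S_J, ratio_rJ(uefc, lqg))   # 800.0 40.0 2.0
```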
Observations:

1. The performance of UEFC and OEFC remains almost the same as $\sigma_b$ increases,
whereas the CE performance becomes considerably worse (Figures 1, 2 and 5).
This is to be expected, since both UEFC and OEFC take the estimation errors
into consideration and are cautious in implementing control, while CE
does not consider such uncertainty (see Remark 2 following equation (43)).

2. The normalized performance of the three suboptimal laws is rather insensitive
to variations in $P_0$; however, $r_J$ decreases slightly as $\sigma_0$ increases
(Figure 6). This is because the uncertainty in $x_0$ ($P_0$), which is common
to the four laws (including the LQG law), becomes comparatively more dominant
than the uncertainty in b ($P_b = 4I_3$) as $\sigma_0$ increases, and as a result
the performance degradation due to unknown B tends to decrease.

3. Considering that the performance of the LQG law cannot be attained when B is
unknown, and that the optimal law with unknown B is worse than the LQG law (the
optimal law with known B), the performance of the UEFC and OEFC laws ($r_J \approx$ 1.5 to 3,
Figures 5 and 6) is good, especially since little on-line computation is
required.

In order to study further the characteristics of the UEFC and OEFC laws, the
time histories of the four laws for a representative run are plotted in

Figure 7: Control u(k)
Figure 8: Estimate $\hat b(k|k) = [\hat b_1 \;\; \hat b_2 \;\; \hat b_3]^T$
Figures 9-12: Estimate $\hat x(k|k) = [\hat x_1 \;\; \hat x_2 \;\; \hat x_3]^T$
Figure 13: Instantaneous cost J(k)

For this run $P_b = P_0 = 4I_3$, the true values of B and x(0) are
$B^T$ = [0.54 −2.07 −3.42] and $x(0)^T$ = [1.19 3.65 5.5...], and the performance measure
$\sum_{k=0}^{N-1} J(k)$ is 404, 787, 880, and 4301 for the LQG, UEFC, OEFC, and CE laws, respectively.

Observations:

The characteristics of the three suboptimal laws are clearly shown in these
figures. The CE law erroneously exerts large control in the beginning (k = 0-5
in Figure 7), thereby incurring large costs (Figure 13). The large control
accidentally results in fast learning of B (Figure 8), and in a lower cost J(k) than
that of the UEFC and OEFC laws at later stages (k ≥ 7). Both UEFC and OEFC are cautious,
and very little control energy is implemented in the beginning (k ≤ 7 in Figure 7),
when larger estimation errors are expected (see Remark 2 following equation (43)).
Since UEFC underestimates future control efforts, it is less cautious than OEFC
and exerts more control at k = 8-14 than OEFC, thereby attaining a lower cost
(Figure 13) and a better estimate $\hat b(k|k)$ (Figure 8). Note that the estimation of
x(k) for UEFC and OEFC is very good (compare Figures 10 and 11 with Figures 9 and
12), although the estimate $\hat b(k|k)$ is not as good as that of CE.
VI.2. Unstable System


The system matrices are given by

$$A = \begin{bmatrix} 1 & 0.2 & 0.0 \\ 0 & 1.0 & 0.2 \\ 1 & -0.6 & 0.8 \end{bmatrix}, \qquad B = b, \quad \bar b = \begin{bmatrix} 0.0 \\ 0.0 \\ -0.2 \end{bmatrix}$$

where A has eigenvalues 1.2 and 0.8 ± 0.4j. As for the stable system, the
performance of the four algorithms for 20 Monte Carlo runs is plotted in

Figure 14: Sample mean $M_J$ for various $P_b$'s
Figure 15: Sample standard deviation $S_J$ for various $P_b$'s
Figure 16: Sample mean $M_J$ for various $P_0$'s
Figure 17: Sample standard deviation $S_J$ for various $P_0$'s
Figure 18: Normalized sample mean $r_J$ for various $P_b$'s
Figure 19: Normalized sample mean $r_J$ for various $P_0$'s


The time histories for a representative run are plotted in

Figure 20: Control u(k)
Figure 21: Estimate $\hat b(k|k)$
Figure 22: Instantaneous cost J(k)

where $P_b = P_0 = 4I_3$, the true values of B and x(0) are $B^T$ =
[−1.90 1.50 −2.07] and $x(0)^T$ = [0.19 1.76 0.37]; and the performance measure
$\sum_{k=0}^{N-1} J(k)$ is 64, 471, 708, and 4565 for the LQG, UEFC, OEFC, and CE laws, respectively.

Observations:

1. The characteristics of the three suboptimal laws are very similar to those
observed for the stable system.

2. The performance of the OEFC law is somewhat worse than in the stable
case, whereas the UEFC law performs consistently well (Figures 14-19).
The CE law performs better than the cautious OEFC and UEFC laws for small
$P_b$ ($\sigma_b$ = 0.1 and 0.3; i.e., when there is little uncertainty in b).

3. Figures 20-22 illustrate the characteristics of the three laws more clearly
than in the stable case (see Observations for Figures 7-13): the large control
efforts at early stages for the CE law cause large cost J(k) and accidentally
fast learning of B (Figure 21), which results in small cost at later
stages (Figure 22). The UEFC law is less cautious than the OEFC law, and
its peak control efforts are implemented earlier (k = 7-10 in Figure 20)
than those of the OEFC law (k = 10-16), resulting in a better overall cost and
estimate $\hat b(k|k)$. Note that the peak of J(k) is also earlier for the UEFC
law (k = 9-12 in Figure 22) than for the OEFC law (k = 13-18).
VII. Conclusions


We have considered a discrete-time linear stochastic adaptive control
system with an unknown control gain matrix (B). Two suboptimal control laws have
been derived: the UEFC law based on the underestimation of future control and
the OEFC law based on the overestimation of future control. These laws require
little on-line computation and at the same time incorporate some information on
the estimation errors; hence they are in the category of "cautious" controls as
classified by Wittenmark [3]. Two single-input third-order systems have been
simulated to compare the Monte Carlo performance of the laws with that of the
CE and LQG laws. The dependence of the performance of the four laws on $P_0$ and
$P_b$ (the initial uncertainties on the state x and the control gain B, respectively)
has been studied. The results indicate that the UEFC and OEFC laws perform much better
than the CE law (except for small $P_b$ in the unstable example), with only a little
extra computation being required.

Admittedly, the class of systems considered in this study is small. How- 
ever, the UEFC and OEFC laws derived for this class are conceptually simple and 
computationally efficient, and may provide a suitable framework for treating the 
more general class, where the system matrix (A) as well as the control gain 
matrix (B) are unknown. Further research is envisaged in this direction. 
Appendix

The identities and inequalities used to derive the estimation and control 
laws in the preceding sections are collected and proved where necessary. The 
matrices involved in the following lemmas are assumed to be conformable. 


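Lemma A.1

$$\mathrm{cs}(ABC) = (C^T \otimes A)\, \mathrm{cs}(B) \qquad (A1)$$

$$\mathrm{rs}(ABC) = (A \otimes C^T)\, \mathrm{rs}(B) \qquad (A2)$$

Lemma A.2

$$\mathrm{tr}(AB) = \mathrm{cs}(A)^T\, \mathrm{cs}(B^T) \qquad (A3)$$

$$\mathrm{tr}(A C^T B C) = \mathrm{cs}(C)^T (A \otimes B^T)\, \mathrm{cs}(C) \qquad (A4)$$

For the proofs of (A1), (A3), and (A4), the reader is referred to [8]. The
identity (A1) is due to Nissen [9]. The proof of (A2) is straightforward: apply (A1)
to $(ABC)^T = C^T B^T A^T$ and use $\mathrm{rs}(X) = \mathrm{cs}(X^T)$.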
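The four identities can be spot-checked with random matrices of compatible sizes; this NumPy sketch is illustrative and is not a proof. It uses the conventions of Section II: rs(X) stacks rows (row-major order) and cs(X) stacks columns (column-major order).

```python
import numpy as np


def cs(X):
    return X.reshape(-1, order="F")


def rs(X):
    return X.reshape(-1)


rng = np.random.default_rng(3)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 5))
assert np.allclose(cs(A @ B @ C), np.kron(C.T, A) @ cs(B))        # (A1)
assert np.allclose(rs(A @ B @ C), np.kron(A, C.T) @ rs(B))        # (A2)

X = rng.standard_normal((3, 4))
Y = rng.standard_normal((4, 3))
assert np.isclose(np.trace(X @ Y), cs(X) @ cs(Y.T))               # (A3)

A2 = rng.standard_normal((4, 4))
B2 = rng.standard_normal((3, 3))
C2 = rng.standard_normal((3, 4))
assert np.isclose(np.trace(A2 @ C2.T @ B2 @ C2),
                  cs(C2) @ np.kron(A2, B2.T) @ cs(C2))            # (A4)
print("Lemmas A.1 and A.2 verified numerically")
```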

Lemma A.3

If A ≥ 0 and B ≥ 0, then A ⊗ B ≥ 0 (A5)

If A > 0 and B > 0, then A ⊗ B > 0 (A6)

Proof:

Since A and B are symmetric, A ⊗ B is symmetric. The eigenvalues of
A ⊗ B are $\lambda_i \mu_j$, where $\lambda_i$ and $\mu_j$ are the eigenvalues of A and B,
respectively [10, p. 235]. Since A ≥ 0 and B ≥ 0, $\lambda_i \geq 0$ and $\mu_j \geq 0$, hence

$$\lambda_i \mu_j \geq 0 \quad \forall\, i, j$$

This implies that A ⊗ B ≥ 0. The proof of (A6) is similar.
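A quick numerical illustration of the eigenvalue argument, using random positive semidefinite A and B (hypothetical data; A is deliberately singular so that zero eigenvalues appear).

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((3, 2))
A = X @ X.T                                          # A >= 0 (rank 2, singular)
Y = rng.standard_normal((2, 2))
B = Y @ Y.T                                          # B >= 0
AB = np.kron(A, B)

lam, mu = np.linalg.eigvalsh(A), np.linalg.eigvalsh(B)
products = np.sort(np.outer(lam, mu).ravel())        # all products lambda_i * mu_j
assert np.allclose(np.linalg.eigvalsh(AB), products) # eigenvalues of A (x) B
assert np.linalg.eigvalsh(AB).min() > -1e-10         # hence A (x) B >= 0, (A5)
```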


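Lemma A.4

If A ≥ 0 and

$$B = \begin{bmatrix} B_1 & B_3^T \\ B_3 & B_2 \end{bmatrix} > 0,$$

where $B_1$ and $B_2$ are square matrices of dimensions m and ℓ, respectively, then

$$B_2 - B_3 B_1^{-1} B_3^T > 0 \qquad (A7)$$

and

$$C \equiv \begin{bmatrix} A \otimes B_1 & A \otimes B_3^T \\ A \otimes B_3 & A \otimes B_2 \end{bmatrix} \geq 0 \qquad (A8)$$

If B ≥ 0 and $B_1$ is a scalar, then

$$B_1 B_2 \geq B_3 B_3^T \qquad (A9)$$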
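Proof:

Since B > 0, $B_1 > 0$ and is invertible. Then

$$B = \begin{bmatrix} I_m & 0_{m,\ell} \\ B_3 B_1^{-1} & I_\ell \end{bmatrix} \begin{bmatrix} B_1 & 0_{m,\ell} \\ 0_{\ell,m} & B_2 - B_3 B_1^{-1} B_3^T \end{bmatrix} \begin{bmatrix} I_m & B_1^{-1} B_3^T \\ 0_{\ell,m} & I_\ell \end{bmatrix}$$

which implies that $B_2 - B_3 B_1^{-1} B_3^T > 0$.

For the case B ≥ 0, (A9) clearly holds if $B_1 = 0$, since a positive semidefinite
matrix with a zero diagonal entry has zeros in the corresponding row and column,
so that $B_3 = 0$. If $B_1 > 0$, the same factorization gives $B_2 - B_3 B_1^{-1} B_3^T \geq 0$,
which, multiplied by the scalar $B_1 > 0$, implies (A9).

To prove (A8) we assume that A is an n-dimensional matrix and let
$A_\epsilon \equiv A + \epsilon I_n$ with ε > 0; then from (A6) $A_\epsilon \otimes B_1 > 0$ and is invertible, since $A_\epsilon > 0$
and $B_1 > 0$. Therefore,

$$C_\epsilon \equiv \begin{bmatrix} A_\epsilon \otimes B_1 & A_\epsilon \otimes B_3^T \\ A_\epsilon \otimes B_3 & A_\epsilon \otimes B_2 \end{bmatrix} = \begin{bmatrix} I_{nm} & 0_{nm,n\ell} \\ D_\epsilon & I_{n\ell} \end{bmatrix} \begin{bmatrix} A_\epsilon \otimes B_1 & 0_{nm,n\ell} \\ 0_{n\ell,nm} & A_\epsilon \otimes B_2 - (A_\epsilon \otimes B_3)(A_\epsilon \otimes B_1)^{-1}(A_\epsilon \otimes B_3^T) \end{bmatrix} \begin{bmatrix} I_{nm} & D_\epsilon^T \\ 0_{n\ell,nm} & I_{n\ell} \end{bmatrix} \qquad (A10)$$

where

$$D_\epsilon \equiv (A_\epsilon \otimes B_3)(A_\epsilon \otimes B_1)^{-1}$$

Using identities for inverses and products of Kronecker products [8], we
can easily write

$$(A_\epsilon \otimes B_3)(A_\epsilon \otimes B_1)^{-1} = I_n \otimes B_3 B_1^{-1}$$

$$A_\epsilon \otimes B_2 - (A_\epsilon \otimes B_3)(A_\epsilon \otimes B_1)^{-1}(A_\epsilon \otimes B_3^T) = A_\epsilon \otimes (B_2 - B_3 B_1^{-1} B_3^T)$$

Therefore, from (A10)

$$C = \lim_{\epsilon \to 0} C_\epsilon = \begin{bmatrix} I_{nm} & 0_{nm,n\ell} \\ I_n \otimes B_3 B_1^{-1} & I_{n\ell} \end{bmatrix} \begin{bmatrix} A \otimes B_1 & 0_{nm,n\ell} \\ 0_{n\ell,nm} & A \otimes (B_2 - B_3 B_1^{-1} B_3^T) \end{bmatrix} \begin{bmatrix} I_{nm} & I_n \otimes B_1^{-1} B_3^T \\ 0_{n\ell,nm} & I_{n\ell} \end{bmatrix} \qquad (A11)$$

From (A5) and (A7), $A \otimes B_1 \geq 0$ and $A \otimes (B_2 - B_3 B_1^{-1} B_3^T) \geq 0$; hence (A11) implies
C ≥ 0.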
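The three conclusions of Lemma A.4 can be illustrated numerically. The data are random, and the matrix A is deliberately singular (A ≥ 0 but not A > 0), which is the case handled by the limiting argument above.

```python
import numpy as np

rng = np.random.default_rng(5)
m, ell, n = 2, 3, 2
Z = rng.standard_normal((m + ell, m + ell))
Bfull = Z @ Z.T + 0.1 * np.eye(m + ell)             # B > 0
B1, B3, B2 = Bfull[:m, :m], Bfull[m:, :m], Bfull[m:, m:]
X = rng.standard_normal((n, 1))
A = X @ X.T                                          # A >= 0, singular

schur = B2 - B3 @ np.linalg.solve(B1, B3.T)
assert np.linalg.eigvalsh(schur).min() > 0                     # (A7)

C = np.block([[np.kron(A, B1), np.kron(A, B3.T)],
              [np.kron(A, B3), np.kron(A, B2)]])
assert np.linalg.eigvalsh(C).min() > -1e-9                     # (A8)

# (A9): scalar B1 with B >= 0 (rank deficient)
Zs = rng.standard_normal((1 + ell, 2))
Bs = Zs @ Zs.T
b1, b3, b2 = Bs[0, 0], Bs[1:, [0]], Bs[1:, 1:]
assert np.linalg.eigvalsh(b1 * b2 - b3 @ b3.T).min() > -1e-9
```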

Lemma A.5

If A > 0 and B ≥ C ≥ 0, then

$$\mathrm{tr}[(A+B)^{-1} B] \geq \mathrm{tr}[(A+C)^{-1} C] \qquad (A12)$$
Proof:

$$\mathrm{tr}[(A+B)^{-1} B] = \mathrm{tr}\{(A+B)^{-1}[(A+B) - A]\} = \mathrm{tr}(I_n) - \mathrm{tr}[(A+B)^{-1} A]$$

Now $\mathrm{tr}[(A+B)^{-1} A] = \mathrm{tr}[A^{1/2} (A+B)^{-1} A^{1/2}]$, and since B ≥ C ≥ 0,

$$(A+B)^{-1} \leq (A+C)^{-1}$$

and

$$A^{1/2} (A+B)^{-1} A^{1/2} \leq A^{1/2} (A+C)^{-1} A^{1/2}$$

Therefore, $\mathrm{tr}[(A+B)^{-1} A] \leq \mathrm{tr}[(A+C)^{-1} A]$. Consequently,

$$\mathrm{tr}[(A+B)^{-1} B] = \mathrm{tr}(I_n) - \mathrm{tr}[(A+B)^{-1} A] \geq \mathrm{tr}(I_n) - \mathrm{tr}[(A+C)^{-1} A] = \mathrm{tr}[(A+C)^{-1} C]$$
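A randomized spot check of (A12) under the lemma's hypotheses A > 0 and B ≥ C ≥ 0; the sampling scheme (B formed as C plus a random positive semidefinite matrix) is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3


def rand_psd():
    X = rng.standard_normal((n, n))
    return X @ X.T


for _ in range(500):
    A = rand_psd() + 0.1 * np.eye(n)      # A > 0
    C = rand_psd()                        # C >= 0
    B = C + rand_psd()                    # B >= C
    lhs = np.trace(np.linalg.solve(A + B, B))
    rhs = np.trace(np.linalg.solve(A + C, C))
    assert lhs >= rhs - 1e-10             # (A12)
print("(A12) held in all random trials")
```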
Figure 7. Time history of control u(k) for the 4 control laws.